Discovery of Diagnostic Patterns from Protein Sequence Databases

Authors

  • Björn Olsson
  • Kim Laurio
Abstract

We show how prior domain knowledge can be used in a system for mining databases of biological data. Our system performs automated discovery of diagnostic patterns from a database of protein sequences. Such patterns are used for classification of new sequences and identification of biologically interesting positions in the proteins. The patterns have a simple syntax and can be translated into regular expressions, which can be used for rapid scanning of databases. Current pattern libraries are built semi-manually, since the correctness of a pattern depends on the incorporation of domain knowledge. Due to the dramatic growth of the databases, it is desirable to automate this process. Our results show that the patterns derived by our fully automated system compete well with the semi-manually constructed patterns.
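
The abstract does not spell out the pattern syntax. As a minimal sketch, assuming a PROSITE-style notation (elements separated by `-`, `x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for N- and C-terminal anchors), translating a pattern into a regular expression and scanning a set of sequences with it could look like this. The functions `prosite_to_regex` and `scan` are hypothetical illustrations, not part of the authors' system.

```python
import re

# One pattern element (assumed PROSITE-style syntax): a residue letter, 'x',
# [allowed residues] or {forbidden residues}, optionally followed by a
# repeat count (n) or (n,m). Terminal markers inside brackets are not handled.
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a simple PROSITE-style pattern into a regex string."""
    body = pattern.strip().rstrip(".")
    pieces = []
    for element in body.split("-"):
        anchor_start = element.startswith("<")  # pattern must start at the N-terminus
        anchor_end = element.endswith(">")      # pattern must end at the C-terminus
        core = element.lstrip("<").rstrip(">")
        m = _ELEMENT.fullmatch(core)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        if symbol == "x":
            regex = "."                          # any residue
        elif symbol.startswith("{"):
            regex = "[^" + symbol[1:-1] + "]"    # any residue except those listed
        else:
            regex = symbol                       # single residue or [ABC] class
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        pieces.append(("^" if anchor_start else "") + regex + ("$" if anchor_end else ""))
    return "".join(pieces)


def scan(pattern: str, sequences: dict[str, str]) -> dict[str, list[int]]:
    """Hypothetical helper: return 1-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() + 1 for m in regex.finditer(seq)]
        if positions:
            hits[name] = positions
    return hits
```

For example, `prosite_to_regex("N-{P}-[ST]-{P}")` yields `N[^P][ST][^P]`, and `scan("N-{P}-[ST]-{P}", {"seq1": "MKNGSAT", "seq2": "MKNPSAT"})` returns `{"seq1": [3]}`: only the first sequence contains the motif, starting at residue 3. Because `re.finditer` reports non-overlapping matches, a sequence with overlapping occurrences would not have every start position listed; for classifying a sequence as a family member, any single hit is enough.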

Similar resources

Sequence-Structure Patterns: Discovery and Applications

Protein sequence data is being generated at a tremendous rate; however, functional annotation of these proteins is proceeding at a much slower pace. Biologists rely on computational biology and pattern recognition to predict the functionality of proteins. This is based on the fact that proteins that share a similar function often exhibit conserved sequence patterns. Such sequence patterns, or m...

Use of Peptide library screening to detect a previously unknown linear diagnostic epitope: proof of principle by use of lyme disease sera.

Diagnostic peptides previously isolated from phage-displayed libraries by affinity selection with serum antibodies from patients with Lyme disease were found to give reproducible serum reactivity patterns when tested in two different enzyme-linked immunosorbent assay formats. In addition, the hypothetical possibility that peptides selected by this type of "epitope discovery" technique might ide...

Protein Databases

Proteins are sources of many peptides with diverse biological activity. Some of them are considered as valuable components of foods and drug targets with desired and designed biological activity. We are now entering an era rich in biological data in which the field of bioinformatics is poised to exploit this information in increasingly powerful ways. There are currently many databases all over ...

Data Mining and Knowledge Discovery in Molecular Databases - Session Introduction

The development and growth of molecular databases over the last decade has brought a growing problem to the biocomputing community. Our ability to analyze, summarize and extract information from these databases has lagged far behind our ability to collect and store data. As well, traditional methods for handling data either automated or manual cannot be effectively applied because of the volume...

iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...

Publication date: 1998